Creating Language Resources for NLP in Indian Languages
Abstract
The lack of lexical resources in electronic form is a major bottleneck for anyone working on NLP for Indian languages. Measures were taken to relieve this bottleneck quickly and efficiently. The reasoning was that if the development of these resources were tied to an example application, the application could serve as a test bed for the resources under development and give constant feedback. Immediate results in the form of a working system also motivate developers engaged in such time-consuming work. Building a machine translation system was therefore chosen as the example application; it would also serve as a vehicle for building the lexical resources.
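The feedback loop described in the abstract can be illustrated with a minimal sketch, assuming that a toy word-by-word lexicon lookup stands in for the translation system. Every word the lexicon lacks is flagged in the output and counted, which gives lexicon developers a ranked list of gaps to fill. The function `translate_with_feedback` and the sample lexicon entries are hypothetical and are not taken from the paper.

```python
from collections import Counter


def translate_with_feedback(sentences, lexicon):
    """Gloss each sentence word by word and collect lexicon gaps.

    Hypothetical sketch: a real MT system also needs morphology,
    word reordering and sense disambiguation; this only shows how
    running the system exposes missing lexicon entries.
    """
    glosses = []
    missing = Counter()  # unknown word -> number of times it was needed
    for sentence in sentences:
        out = []
        for word in sentence.lower().split():
            if word in lexicon:
                out.append(lexicon[word])
            else:
                out.append(f"<{word}>")  # mark the gap in the output
                missing[word] += 1
        glosses.append(" ".join(out))
    return glosses, missing


# Toy, hypothetical English -> Hindi (romanized) entries, for illustration only.
lexicon = {"ram": "raam", "reads": "padhta hai", "a": "ek", "book": "kitaab"}
glosses, missing = translate_with_feedback(
    ["Ram reads a book", "Sita reads a letter"], lexicon
)
print(glosses)
print(missing.most_common())
```

Ranking the gaps by frequency lets the most frequently needed entries be added first. Note that without morphological analysis every inflected form would need its own lexicon entry, which is one reason such a test bed quickly exposes which resources are still missing.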
Publication date: 2003